Results 1 - 14 of 14
1.
J Proteome Res ; 2024 Apr 10.
Article in English | MEDLINE | ID: mdl-38594959

ABSTRACT

Reproducibility is a "proteomic dream" yet to be fully realized. A typical data analysis workflow utilizing extracted ion chromatograms (XICs) often treats the information path from identification to quantification as a one-way street. Here, we propose an XIC-centric approach in which the data flow is bidirectional: identifications are used to derive XICs whose information is in turn applied to validate the identifications. In this study, we employed liquid chromatography-mass spectrometry data from glycoprotein and human hair samples to illustrate the XIC-centric concept. At the core of this approach was XIC-based monoisotope repicking. Taking advantage of the intensity information for all detected isotopes across the whole range of an XIC peak significantly improved the accuracy and uncovered misidentifications originating from monoisotope assignment mistakes. It could also rescue non-top-ranked glycopeptide hits. Identification of glycopeptides is particularly susceptible to precursor mass errors because of their low abundances, large masses, and glycans that differ by 1 or 2 Da and are easily confused with isotopes. In addition, the XIC-centric strategy significantly reduced the problem of one XIC peak associated with multiple unique identifications, a source of quantitative irreproducibility. Taken together, the proposed approach can lead to improved identification and quantification accuracy and, ultimately, enhanced reproducibility in proteomic data analyses.

2.
J Proteome Res ; 23(4): 1443-1457, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38450643

ABSTRACT

We report the comparison of mass-spectral-based abundances of tryptic glycopeptides to fluorescence abundances of released labeled glycans and the effects of mass and charge state and in-source fragmentation on glycopeptide abundances. The primary glycoforms derived from Rituximab, NISTmAb, Evolocumab, and Infliximab were high-mannose and biantennary complex galactosylated and fucosylated N-glycans. Except for Evolocumab, in-source ions derived from the loss of HexNAc or HexNAc-Hex sugars are prominent for other therapeutic IgGs. After excluding in-source fragmentation of glycopeptide ions from the results, a linear correlation was observed between fluorescently labeled N-glycan and glycopeptide abundances over a dynamic range of 500. Different charge states of human IgG-derived glycopeptides containing a wider variety of abundant attached glycans were also investigated to examine the effects of the charge state on ion abundances. These revealed a linear dependence of glycopeptide abundance on the mass of the glycan with higher charge states favoring higher-mass glycans. Findings indicate that the mass spectrometry-based bottom-up approach can provide results as accurate as those of glycan release studies while revealing the origin of each attached glycan. These site-specific relative abundances are conveniently displayed and compared using previously described glycopeptide abundance distribution spectra (GADS) representations. Mass spectrometry data are available from the MassIVE repository (MSV000093562).


Subjects
Immunoglobulin G, Tandem Mass Spectrometry, Humans, Glycosylation, Glycopeptides/analysis, Polysaccharides/chemistry, Ions
3.
J Proteome Res ; 23(1): 409-417, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38009783

ABSTRACT

A fast and sensitive direct extraction (DE) method developed in our group can efficiently extract proteins in 30 min from a 5 cm-long hair strand. Previously, we coupled DE to downstream analysis using gel electrophoresis followed by in-gel digestion, which can be time-consuming. In searching for a better alternative, we found that a combination of DE with a bead-based method (SP3) can lead to significant improvements in protein discovery in human hair. Since SP3 is designed for general applications, we optimized it to process hair proteins following DE and compared it to several other in-solution digestion methods. Of particular concern are genetically variant peptides (GVPs), which can be used for human identification in forensic analysis. Here, we demonstrated improved GVP discovery with the DE and SP3 workflow, which was 3 times faster than the previous in-gel digestion method and required significantly less instrument time depending on the number of gel slices processed. Additionally, it led to an increased number of identified proteins and GVPs. Among the tested in-solution digestion methods, DE combined with SP3 showed the highest sequence coverage, with higher abundances of the identified peptides. This provides a significantly enhanced means for identifying proteins and GVPs in human hair.


Subjects
Peptides, Proteins, Humans, Proteins/analysis, Peptides/analysis, Electrophoresis, Hair/chemistry, Hair/metabolism
4.
J Proteome Res ; 22(10): 3225-3241, 2023 Oct 06.
Article in English | MEDLINE | ID: mdl-37647588

ABSTRACT

Glycopeptide Abundance Distribution Spectra (GADS) were recently introduced as a means of representing, storing, and comparing glycan profiles of intact glycopeptides. Here, using that representation, an extensive analysis is made of multiple commercial sources of the recombinant SARS-CoV-2 spike protein, each containing 22 N-linked glycan sites (sequons). Multiple proteases are used along with variable energy fragmentation followed by ion trap confirmation. This enables a detailed examination of the reproducibility of the method across multiple types of variability. These results show that GADS are consistent between replicates and laboratories for sufficiently abundant glycopeptides. Derived GADS enable the examination and comparison of the glycan profiles between commercial sources of the spike protein. Multiple distinct glycopeptide distributions, generated by multiple proteases, confirm these profiles. Comparisons of GADS derived from 11 sources of recombinant spike protein reveal that sources for which protein expression methods were the same produced near-identical glycan profiles, thereby demonstrating the ability of this method to measure GADS of sufficient reliability to distinguish different glycoform distributions between commercial vendors and potentially to reliably determine and compare differences in glycosylation for any glycoprotein under different conditions of production. All mass spectrometry data files have been deposited in the MassIVE repository under the identifier MSV000091776.

5.
J Proteome Res ; 20(3): 1612-1629, 2021 Mar 05.
Article in English | MEDLINE | ID: mdl-33555887

ABSTRACT

This work presents methods for identifying and then creating a mass spectral library for disulfide-linked peptides originating from the NISTmAb, a reference material of the humanized IgG1κ monoclonal antibody (RM 8671). Analyses involved both partially reduced and non-reduced samples under neutral and weakly basic conditions followed by nanoflow liquid chromatography tandem mass spectrometry (LC-MS/MS). Spectra of peptides containing disulfide bonds are identified by both MS1 ion and MS2 fragment ion data in order to completely map all the disulfide linkages in the NISTmAb. This led to the detection of 383 distinct disulfide-linked peptide ions, arising from fully tryptic cleavage, missed cleavage, irregular cleavage, complex Met/Trp oxidation mixtures, and metal adducts. Fragmentation features of disulfide bonds under low-energy collision dissociation were examined. These include (1) peptide bond cleavage leaving disulfide bonds intact; (2) disulfide bond cleavage, often leading to extensive fragmentation; and (3) double cleavage products resulting from breakages of two peptide bonds or both peptide and disulfide bonds. Automated annotation of various complex MS/MS fragments enabled the identification of disulfide-linked peptides with high confidence. Peptides containing each of the nine native disulfide bonds were identified along with 86 additional disulfide linkages arising from disulfide bond shuffling. The presence of shuffled disulfides was nearly completely abrogated by refining digest conditions. A curated spectral library of 702 disulfide-linked peptide spectra was created from this analysis and is publicly available for free download. Since all IgG1 antibodies have the same constant regions, the resulting library can be used as a tool for facile identification of "hard-to-find" disulfide-bonded peptides. Moreover, we show that one may identify such peptides originating from IgG1 proteins in human serum, thereby serving as a means of monitoring the completeness of protein reduction in proteomics studies. Data are available via ProteomeXchange with identifier PXD023358.


Subjects
Peptides, Tandem Mass Spectrometry, Amino Acid Sequence, Liquid Chromatography, Disulfides, Humans
6.
Anal Chem ; 92(15): 10316-10326, 2020 Aug 04.
Article in English | MEDLINE | ID: mdl-32639750

ABSTRACT

This study significantly expands both the scope and method of identification for construction of a previously reported tandem mass spectral library of 74 human milk oligosaccharides (HMOs) derived from results of combined LC-MS/MS experiments and comprehensive structural analysis of HMOs. In the present work, a hybrid search "bootstrap" identification method was employed that substantially broadens the coverage of milk oligosaccharides and thereby increases the utility of a spectrum library-based method for the rapid tentative identification of all distinguishable glycans in milk. This involved hybrid searching of the previous library, which was itself constructed using the hybrid search of oligosaccharide spectra in the NIST 17 Tandem MS Library. The general approach appears applicable to library construction of other classes of compounds. The coverage of oligosaccharides was significantly extended using milks from a variety of mammals, including bovine, Asian buffalo, African lion, and goat. This new method led to the identification of another 145 oligosaccharides, including an additional 80 HMOs from reanalysis of human milk. The newly identified compounds were added to a freely available mass spectral reference database of 219 milk oligosaccharides. We also provide suggestions to overcome several limitations and pitfalls in the interpretation of spectra of unknown oligosaccharides.


Subjects
Mammals, Human Milk/chemistry, Milk/chemistry, Oligosaccharides/chemistry, Small Molecule Libraries, Animals, Humans, Species Specificity, Tandem Mass Spectrometry
7.
J Forensic Sci ; 65(2): 406-420, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31670846

ABSTRACT

Recent reports have demonstrated that genetically variant peptides (GVPs) derived from human hair shaft proteins can be used to differentiate individuals of different biogeographic origins. We report a method involving direct extraction of hair shaft proteins that is more sensitive than previously published methods with regard to GVP detection. It involves one step for protein extraction and was found to provide reproducible results. A detailed proteomic analysis of these data is presented that led to the following four results: (i) a peptide spectral library was created and made available for download. It contains all identified peptides from this work, including GVPs that, when appropriately expanded with diverse hair-derived peptides, can provide a routine, reliable, and sensitive means of analyzing hair digests; (ii) an analysis of artifact peptides arising from side reactions is also made using a new method for finding unexpected modifications; (iii) detailed analysis of the gel-based method employed clearly shows the high degree of cross-linking or protein association involved in hair digestion, with major GVPs eluting over a wide range of high molecular weights while others apparently arise from distinct non-cross-linked proteins; and (iv) finally, we show that some of the specific GVP identifications depend on the sample preparation method.


Subjects
Hair/metabolism, Hair-Specific Keratins/metabolism, Peptides/metabolism, Proteome/metabolism, Artifacts, Liquid Chromatography, Protein Databases, Forensic Medicine, Humans, Male, Mass Spectrometry, Proteomics, Reproducibility of Results
8.
Bioinformatics ; 30(24): 3575-82, 2014 Dec 15.
Article in English | MEDLINE | ID: mdl-25172925

ABSTRACT

MOTIVATION: The alignment of DNA sequences to proteins, allowing for frameshifts, is a classic method in sequence analysis. It can help identify pseudogenes (which accumulate mutations), analyze raw DNA and RNA sequence data (which may have frameshift sequencing errors), investigate ribosomal frameshifts, etc. Often, however, only ad hoc approximations or simulations are available to provide the statistical significance of a frameshift alignment score. RESULTS: We describe a method to estimate statistical significance of frameshift alignments, similar to classic BLAST statistics. (BLAST presently does not permit its alignments to include frameshifts.) We also illustrate the continuing usefulness of frameshift alignment with two 'post-genomic' applications: (i) when finding pseudogenes within the human genome, frameshift alignments show that most anciently conserved non-coding human elements are recent pseudogenes with conserved ancestral genes; and (ii) when analyzing metagenomic DNA reads from polluted soil, frameshift alignments show that most alignable metagenomic reads contain frameshifts, suggesting that metagenomic analysis needs to use frameshift alignment to derive accurate results.


Subjects
Frameshift Mutation, Sequence Alignment/methods, Algorithms, Statistical Data Interpretation, Human Genome, Genomics, Humans, Metagenomics, Pseudogenes, DNA Sequence Analysis, Protein Sequence Analysis, RNA Sequence Analysis, Software
9.
Int J Bioinform Res Appl ; 10(4-5): 384-408, 2014.
Article in English | MEDLINE | ID: mdl-24989859

ABSTRACT

Some biological sequences contain subsequences of unusual composition; e.g. some proteins contain DNA binding domains, transmembrane regions and charged regions, and some DNA sequences contain repeats. The linear-time Ruzzo-Tompa (RT) algorithm finds subsequences of unusual composition, using a sequence of scores as input and the corresponding 'maximal segments' as output. In principle, permitting gaps in the output subsequences could improve sensitivity. Here, the input of the RT algorithm is generalised to a finite, totally ordered, weighted graph, so the algorithm locates paths of maximal weight through increasing but not necessarily adjacent vertices. By permitting the penalised deletion of unfavourable letters, the generalisation therefore includes gaps. The program RepWords, which finds inexact simple repeats in DNA, exemplifies the general concepts by out-performing a similar extant, ad hoc tool. With minimal programming effort, the generalised Ruzzo-Tompa algorithm could improve the performance of many programs for finding biological subsequences of unusual composition.


Subjects
Algorithms, Computational Biology/methods, DNA/chemistry, Animals, Genome, Humans, Programming Languages, Protein Binding, Proteins/chemistry, ROC Curve, Sequence Alignment, Software
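
The Ruzzo-Tompa procedure named in the abstract above can be illustrated with a minimal sketch. It covers only the classic algorithm on a plain sequence of scores, not the generalisation to weighted graphs with gaps that the article describes, and it rescans the candidate list instead of using the bookkeeping that makes the original algorithm linear-time. The function name `maximal_segments` is an assumption made for this illustration.

```python
def maximal_segments(scores):
    """Return (start, end, score) for each maximal scoring segment.

    `end` is exclusive. Each candidate stores [start, end, L, R], where
    L is the cumulative score before the segment and R the cumulative
    score at its right end.
    """
    candidates = []
    total = 0
    for i, s in enumerate(scores):
        left = total
        total += s
        if s <= 0:
            continue  # non-positive scores only change the running total
        cand = [i, i + 1, left, total]
        while True:
            # Search right to left for the nearest candidate with L_j < L_k.
            j = len(candidates) - 1
            while j >= 0 and candidates[j][2] >= cand[2]:
                j -= 1
            # No such candidate, or it already reaches at least as high:
            # keep the new segment as a separate candidate.
            if j < 0 or candidates[j][3] >= cand[3]:
                candidates.append(cand)
                break
            # Otherwise extend the new segment leftwards over candidate j,
            # drop candidates j onward, and reconsider the merged segment.
            cand = [candidates[j][0], cand[1], candidates[j][2], cand[3]]
            del candidates[j:]
    return [(a, b, r - l) for a, b, l, r in candidates]


if __name__ == "__main__":
    print(maximal_segments([4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]))
```

For the score sequence in the example, the segments found are `[4]`, `[3]` and the run from index 4 to the end, with scores 4, 3 and 7.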
10.
Nucleic Acids Res ; 41(1): e22, 2013 Jan 07.
Article in English | MEDLINE | ID: mdl-23034809

ABSTRACT

Microsatellites (MSs) are DNA regions consisting of repeated short motif(s). MSs are linked to several diseases and have important biomedical applications. Thus, researchers have developed several computational tools to detect MSs. However, the currently available tools require adjusting many parameters, or depend on a list of motifs or on a library of known MSs. Therefore, two laboratories analyzing the same sequence with the same computational tool may obtain different results due to the user-adjustable parameters. Recent studies have indicated the need for a standard computational tool for detecting MSs. To this end, we applied machine-learning algorithms to develop a tool called MsDetector. The system is based on a hidden Markov model and a general linear model. The user is not obligated to optimize the parameters of MsDetector. Neither a list of motifs nor a library of known MSs is required. MsDetector is memory- and time-efficient. We applied MsDetector to several species. MsDetector located the majority of MSs found by other widely used tools. In addition, MsDetector identified novel MSs. Furthermore, the system has a very low false-positive rate resulting in a precision of up to 99%. MsDetector is expected to produce consistent results across studies analyzing the same sequence.


Subjects
Microsatellite Repeats, DNA Sequence Analysis, Software, Animals, Arabidopsis/genetics, Artificial Intelligence, Chromosomes/chemistry, Drosophila melanogaster/genetics, Human Genome, Genomics/methods, Humans, Mycobacterium tuberculosis/genetics, Plasmodium falciparum/genetics, Saccharomyces cerevisiae/genetics
11.
Bioinformatics ; 26(14): 1708-13, 2010 Jul 15.
Article in English | MEDLINE | ID: mdl-20505002

ABSTRACT

MOTIVATION: Since database retrieval is a fundamental operation, the measurement of retrieval efficacy is critical to progress in bioinformatics. This article points out some issues with current methods of measuring retrieval efficacy and suggests some improvements. In particular, many studies have used the pooled receiver operating characteristic for n irrelevant records (ROC(n)) score, the area under the ROC curve (AUC) of a 'pooled' ROC curve, truncated at n irrelevant records. Unfortunately, the pooled ROC(n) score does not faithfully reflect actual usage of retrieval algorithms. Additionally, a pooled ROC(n) score can be very sensitive to retrieval results from as little as a single query. METHODS: To replace the pooled ROC(n) score, we propose the Threshold Average Precision (TAP-k), a measure closely related to the well-known average precision in information retrieval, but reflecting the usage of E-values in bioinformatics. Furthermore, in addition to conditions previously given in the literature, we introduce three new criteria that an ideal measure of retrieval efficacy should satisfy. RESULTS: PSI-BLAST, GLOBAL, HMMER and RPS-BLAST provided examples of using the TAP-k and pooled ROC(n) scores to evaluate sequence retrieval algorithms. In particular, compelling examples using real data highlight the drawbacks of the pooled ROC(n) score, showing that it can produce evaluations skewing far from intuitive expectations. In contrast, the TAP-k satisfies most of the criteria desired in an ideal measure of retrieval efficacy. AVAILABILITY AND IMPLEMENTATION: The TAP-k web server and downloadable Perl script are freely available at http://www.ncbi.nlm.nih.gov/CBBresearch/Spouge/html.ncbi/tap/


Subjects
Computational Biology/methods, Information Storage and Retrieval/methods, Software, Factual Databases, Internet, ROC Curve
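
To make the truncated ROC score discussed in the abstract above concrete, the following sketch computes a ROC(n) value for a single ranked result list. It assumes the conventional normalisation, in which the area is the sum, over the first n irrelevant records, of the number of relevant records ranked above each one, divided by n times the total number of relevant records. The function name `roc_n` and its arguments are illustrative, and neither the pooling across queries nor the TAP-k measure itself is reproduced here.

```python
def roc_n(ranked_relevance, n, total_relevant=None):
    """ROC(n) score of one ranked list of booleans (True = relevant record).

    `total_relevant` defaults to the number of relevant records in the list;
    pass it explicitly if some relevant records were never retrieved.
    """
    if total_relevant is None:
        total_relevant = sum(ranked_relevance)
    if n <= 0 or total_relevant <= 0:
        raise ValueError("n and total_relevant must be positive")
    true_seen = 0
    false_seen = 0
    area = 0
    for is_relevant in ranked_relevance:
        if is_relevant:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen  # relevant records ranked above this error
            if false_seen == n:
                break
    # If the list ends before n irrelevant records, the curve stays flat.
    area += (n - false_seen) * true_seen
    return area / (n * total_relevant)


if __name__ == "__main__":
    print(roc_n([True, True, False, True, False], n=2))
```

A perfect ranking scores 1.0 and a ranking that places n irrelevant records first scores 0.0 under this normalisation.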
12.
Nucleic Acids Res ; 36(18): 5863-71, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18796526

ABSTRACT

Pairwise sequence alignment is a ubiquitous tool for inferring the evolution and function of DNA, RNA and protein sequences. It is therefore essential to identify alignments arising by chance alone, i.e. spurious alignments. On one hand, if an entire alignment is spurious, statistical techniques for identifying and eliminating it are well known. On the other hand, if only a part of the alignment is spurious, elimination is much more problematic. In practice, even the sizes and frequencies of spurious subalignments remain unknown. This article shows that some common scoring schemes tend to overextend alignments and generate spurious alignment flanks up to hundreds of base pairs/amino acids in length. In the UCSC genome database, for example, spurious flanks probably comprise >18% of the human-fugu genome alignment. To evaluate the possibility that chance alone generated a particular flank on a particular pairwise alignment, we provide a simple 'overalignment' P-value. The overalignment P-value can identify spurious alignment flanks, thereby eliminating potentially misleading inferences about evolution and function. Moreover, by explicitly demonstrating the tradeoff between over- and under-alignment, our methods guide the rational choice of scoring schemes for various alignment tasks.


Subjects
Sequence Alignment/methods, Animals, Computational Biology, Statistical Data Interpretation, Genomics, Humans, Probability
13.
Nucleic Acids Res ; 35(14): 4678-85, 2007.
Article in English | MEDLINE | ID: mdl-17596268

ABSTRACT

The sequencing of complete genomes has created a pressing need for automated annotation of gene function. Because domains are the basic units of protein function and evolution, a gene can be annotated from a domain database by aligning domains to the corresponding protein sequence. Ideally, complete domains are aligned to protein subsequences, in a 'semi-global alignment'. Local alignment, which aligns pieces of domains to subsequences, is common in high-throughput annotation applications, however. It is a mature technique, with the heuristics and accurate E-values required for screening large databases and evaluating the screening results. Hidden Markov models (HMMs) provide an alternative theoretical framework for semi-global alignment, but their use is limited because they lack heuristic acceleration and accurate E-values. Our new tool, GLOBAL, overcomes some limitations of previous semi-global HMMs: it has accurate E-values and the possibility of the heuristic acceleration required for high-throughput applications. Moreover, according to a standard of truth based on protein structure, two semi-global HMM alignment tools (GLOBAL and HMMer) had comparable performance in identifying complete domains, but distinctly outperformed two tools based on local alignment. When searching for complete protein domains, therefore, GLOBAL avoids disadvantages commonly associated with HMMs, yet maintains their superior retrieval performance.


Subjects
Tertiary Protein Structure, Sequence Alignment, Protein Sequence Analysis/methods, Algorithms, Amino Acid Sequence, Computational Biology/methods, Conserved Sequence, Protein Databases, Markov Chains, Reproducibility of Results, Software
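
The difference between semi-global and local alignment described in the abstract above can be sketched with a small dynamic programme. This is not the HMM used by GLOBAL or HMMer: it is a plain score-only alignment in which the whole domain must be aligned while unaligned protein residues before and after it are free. The function name and the match, mismatch and linear gap scores are assumptions chosen for illustration.

```python
def semi_global_score(domain, protein, match=2, mismatch=-1, gap=-2):
    """Best score for aligning the complete `domain` to a subsequence of `protein`."""
    cols = len(protein) + 1
    # Row 0 is all zeros: skipping a prefix of the protein costs nothing.
    prev = [0] * cols
    for i in range(1, len(domain) + 1):
        # Column 0: the first i domain residues are all aligned to gaps.
        curr = [prev[0] + gap] + [0] * (cols - 1)
        for j in range(1, cols):
            sub = match if domain[i - 1] == protein[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align domain[i-1] with protein[j-1]
                prev[j] + gap,      # domain residue opposite a gap
                curr[j - 1] + gap,  # protein residue opposite a gap
            )
        prev = curr
    # Taking the maximum over the last row makes the protein suffix free too.
    return max(prev)


if __name__ == "__main__":
    print(semi_global_score("ACDE", "MKACDEWW"))  # complete domain present
    print(semi_global_score("ACDE", "MKAC"))      # only half the domain present
```

In the second call a local alignment would still reward the matching `AC` fragment, whereas the semi-global score is pulled down by the two unmatched domain residues, which is the behaviour wanted when searching for complete domains.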
14.
BMC Bioinformatics ; 7: 408, 2006 Sep 08.
Article in English | MEDLINE | ID: mdl-16961919

ABSTRACT

BACKGROUND: Many DNA regulatory elements occur as multiple instances within a target promoter. Gibbs sampling programs for finding DNA regulatory elements de novo can be prohibitively slow in locating all instances of such an element in a sequence set. RESULTS: We describe an improvement to the A-GLAM computer program, which predicts regulatory elements within DNA sequences with Gibbs sampling. The improvement adds an optional "scanning step" after Gibbs sampling. Gibbs sampling produces a position specific scoring matrix (PSSM). The new scanning step resembles an iterative PSI-BLAST search based on the PSSM. First, it assigns an "individual score" to each subsequence of appropriate length within the input sequences using the initial PSSM. Second, it computes an E-value from each individual score, to assess the agreement between the corresponding subsequence and the PSSM. Third, it permits subsequences with E-values falling below a threshold to contribute to the underlying PSSM, which is then updated using the Bayesian calculus. A-GLAM iterates its scanning step to convergence, at which point no new subsequences contribute to the PSSM. After convergence, A-GLAM reports predicted regulatory elements within each sequence in order of increasing E-values, so users have a statistical evaluation of the predicted elements in a convenient presentation. Thus, although the Gibbs sampling step in A-GLAM finds at most one regulatory element per input sequence, the scanning step can now rapidly locate further instances of the element in each sequence. CONCLUSION: Datasets from experiments determining the binding sites of transcription factors were used to evaluate the improvement to A-GLAM. Typically, the datasets included several sequences containing multiple instances of a regulatory motif. The improvements to A-GLAM permitted it to predict the multiple instances.


Subjects
Markov Chains, Monte Carlo Method, Transcriptional Regulatory Elements/genetics, DNA Sequence Analysis/methods, Animals, Base Sequence, Drosophila/genetics, Molecular Sequence Data, Saccharomyces/genetics, Sequence Alignment/methods
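
The first part of the scanning step described in the abstract above, assigning an individual score to every subsequence of appropriate length using a PSSM, can be sketched as follows. This is a hedged illustration rather than A-GLAM's actual scoring: it assumes a log-odds matrix built from aligned example sites with a uniform background and a fixed pseudocount, and it omits the E-value calculation and the Bayesian update of the matrix. All function names and parameters are hypothetical.

```python
import math

ALPHABET = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PSSM (one dict per column) from equal-length aligned sites."""
    width = len(sites[0])
    pssm = []
    for pos in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append(
            {base: math.log2(counts[base] / total / background) for base in ALPHABET}
        )
    return pssm


def scan(sequence, pssm):
    """Score every window of the PSSM's width; return hits, best score first."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pssm, window))
        hits.append((start, window, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    example_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
    for start, window, score in scan("GGCTATAATGCGTATAATCC", build_pssm(example_sites))[:3]:
        print(start, window, round(score, 2))
```

Because every window is scored, a sequence containing several instances of a motif yields several high-scoring hits, which is the property the scanning step relies on to find instances beyond the single one located by Gibbs sampling.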